Fifty-eight middle class children were tested over 6
years with 25 achievement, I.Q., and personality tests. Consistency
of test results was evaluated by a variance comparison method and a
simple signal detection model. Both methods lead to the conclusion
that achievement tests are far better predictors than personality
tests, with I.Q. scales placing in between. (Author)
ON THE TESTING OF TESTS:
A LONGITUDINAL STUDY

Université de Montréal

The purpose of the present study was twofold: (1) to evaluate the
consistency of a number of standard tests used to measure cognitive,
perceptual-motor, achievement and personality variables in children and (2) to
assess the degree to which inconsistency of a test can be attributed to uneven
developmental growth in children. We aimed at conceptually simple constructs to
derive the predictive value of tests during the development of normal school
children, with particular emphasis on long-term achievement.

SUBJECTS: Ss were the 103 children of the Kindergarten grade from two Montreal
suburban schools; 52 were boys and 51 were girls. The schools were selected to
represent a typical middle class background: the median interval of annual family
income in the sample was $7,500 to $10,000 (1964). [Percentage illegible] of the
fathers had attended at least one year of university, 32% had finished High School
and 28% had not completed their High School education. Socio-economic distribution
was as follows: 19% were professionals, 36% held sales and clerical jobs, 17% were
self-employed, and [percentage illegible] occupied administrative posts. The
remaining 17% were placed in the miscellaneous category.


Testing Procedure

The children were followed over 6 years, from Kindergarten through
Grade 5 (196?-1970). Depending on the test, the children were either individually
tested or seen as a group at yearly intervals by two psychotechnicians.
Achievement Tests for Reading, Arithmetic and Language were administered in the
10th month of each Grade. Scoring of tests was supervised by a psychologist.



The following is the list of tests used in the study:

1. Lincoln-Oseretsky Motor Development Scale.
2. Goodenough-Harris Draw-a-Woman test.
3. Goodenough-Harris Draw-a-Man test.
4. WISC Performance Scale.
5. WISC Verbal Scale.
6. WISC Full Scale.
7. Lorge-Thorndike Group Intelligence Scale.
8. Piaget test of causal and operational thinking (Total score).
9. - 21. Cattell's Children's Personality Questionnaire, Scales A to Q4.
22. California Achievement Test: Reading.
23. California Achievement Test: Arithmetic.
24. California Achievement Test: Language.
25. California Achievement Test: Total score.

Attrition rate for the group between Kindergarten and Grade 5 was 35%,
with 67 children remaining in the sample by Grade 5. For an analysis of test
stability from year to year only those children for whom data were complete for
all years were included. With this constraint the final sample consisted of only
58 children, but the intra-group variability was not distorted by extraneous
subjects.

INTRA-TEST STABILITY ANALYSIS

Previous longitudinal research has shown that test measures, including
I.Q. tests, tend to increase over the years. The present study confirmed this
trend: groups showed an average gain of 10 to 20 percent depending on the test.
Thus, a child who maintained the same score over the years was in fact losing
points if the group mean had increased. Rate of developmental change was evaluated
in relation to the child's group, and therefore standard scores were used as the
basis for a quantitative measure of rate of change.

Change in a standard score reflects a change in the child's relative
standing in the group. Differences in standard scores for each child, from year
to year, provide a measure of his mobility within that group. Summing these
differences in Z scores over the years results in a value which is numerically
identical to the difference between the first and last measures. That is, the
difference between the first and last measures represents the total amount of
change for a particular child (relative to his group). In order to assess the
amount of movement within the group as a whole, one is tempted to take the mean of
those differences. However, because the measurements are in Z scores, the mean
of these differences will be zero. The sum of squared differences, divided by N,
will give the desired quantity. It is easily shown that this is a between subject
variance and represents average intra-group mobility. This variance was used to
discriminate between tests. High variance within a test over the years signifies
a great deal of instability and the test will be a poor predictor.
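
The computation described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the authors' original
procedure: the function names and the sample scores are hypothetical, scores are
standardized within each year using the population SD (division by N), and NumPy
is assumed to be available.

```python
import numpy as np

def standardize_by_year(scores):
    """Convert raw scores (children x years) to Z scores within each year.

    Assumption: the population SD (ddof=0) is used, and no year has zero spread.
    """
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean(axis=0)) / scores.std(axis=0)

def intra_group_mobility(scores):
    """Between-subject variance of the first-to-last change in Z score."""
    z = standardize_by_year(scores)
    yearly_steps = np.diff(z, axis=1)        # year-to-year differences
    total_change = yearly_steps.sum(axis=1)  # telescopes to z[:, -1] - z[:, 0]
    # The mean of total_change is zero, so the mean square is its variance.
    return float(np.mean(total_change ** 2))

# Hypothetical raw scores for 4 children over 3 yearly administrations.
example = np.array([
    [10, 12, 15],
    [12, 15, 17],
    [9, 14, 20],
    [15, 16, 18],
])
print(intra_group_mobility(example))
```

With this standardization the index equals 2(1 - r), where r is the correlation
between the first and the last administration, so a low value corresponds to a
high first-to-last correlation.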

There remains the problem of how to differentiate test instability due
to poor test construction from instability due to the idiosyncratic variability
of individual children. Since the samples for different tests were not always
made up of the same Ss it was possible that some tests fared badly because they
were plagued with highly unstable children. We expressed the inconsistency of the
individual child as the difference of the standard scores from year to year and
calculated the variance of those difference scores for each of the 27 tests. These
variances are directly comparable and measure the extent to which Subject
variability contributes to the uncertainty in long-term predictions from these
tests. Thus comparatively high variability in some tests cannot be attributed to
the Ss if the same children show erratic scores in just those tests while
remaining quite consistent in others. By the same token, tests which include
sizeable proportions of children with erratic scores throughout may be excused
for not predicting better than they do.

RESULTS



Figure 1 describes the relative stability of tests as the between
Ss variance calculated from the difference of standard scores over the span of
5 or 6 years. The lower the variance the greater the stability of the test. It
appears that the achievement tests show the strongest predictive power. The total
achievement score on the California Achievement Test (#25) has a variance of .23,
meaning that the average displacement to be expected of Ss taking this test is
less than 1/2 SD (√.23 = .48). The three achievement scales of the CAT, reading,
arithmetic and language (#22-24), as well as the intelligence tests (#4-8) and the
motor tests (#1-3) with the exception of the Goodenough-Harris (woman) test (#2),
have a comparable stability index of about 1, meaning that the average
displacement of Ss within the group did not exceed one SD. The predictive power of
personality tests (#9-21), where the subject's standing in the group over
successive years changes a great deal, is weak.

This conclusion is supported by an analysis of the degree to which
individual children show test variability over classes of tests. Taking again the
difference in Z values between the first and the last year as a score and
calculating the variability of these scores for each S over the four groups of
tests, only one child in 29 (3%) exceeded a variance of 1 in the achievement
tests, 15 out of 58 children (26%) had variances greater than 1 in the motor and
intelligence tests, but 34 out of 43 children (79%) exceeded this value in the
personality tests.
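
A rough sketch of this per-child count follows. The text does not say whether each
child's variability was taken about the child's own mean change or about zero, so
the sketch assumes the ordinary variance about the child's mean (dividing by the
number of tests). The function name and the sample values are hypothetical.

```python
import numpy as np

def share_of_unstable_children(change, threshold=1.0):
    """Proportion of children whose variance across tests exceeds the threshold.

    change: array (children x tests) of first-to-last Z-score differences
            for the tests belonging to one group (e.g. the achievement tests).
    Assumption: variance is taken about each child's own mean, with ddof=0.
    """
    change = np.asarray(change, dtype=float)
    per_child_variance = change.var(axis=1)
    return float(np.mean(per_child_variance > threshold))

# Hypothetical first-to-last Z differences for 4 children on 3 tests.
group_changes = np.array([
    [0.1, -0.2, 0.0],   # stable child
    [2.0, -1.5, 0.5],   # erratic child
    [0.3, 0.3, 0.3],    # uniform shift, zero variance
    [-2.0, 2.0, 0.0],   # erratic child
])
print(share_of_unstable_children(group_changes))
```

Applied separately to each group of tests, a count of this kind yields proportions
comparable to the 3%, 26% and 79% reported above.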

Therefore the instability encountered in these latter tests seems to be
due largely to the poor characteristics of the tests rather than the idiosyncratic
variability in the children. The fact that some scales are better than others,
e.g. [illegible] versus [illegible] (two factors on the Cattell Scale), cannot
obscure the finding that the personality tests as a whole cannot be interpreted
with the same degree of confidence as the other tests.

THE CONSISTENCY OF GROUP CLASSIFICATIONS

So far the evaluation of test stability was based on comparisons of
variances derived from differences of Z scores. For the practitioner it might
be of greater value to know whether the groupings and distinctions he makes on
the basis of his test results are reasonably consistent over the long run.

We therefore applied a second method, non-parametric in form, derived
from a signal detection model, which has the appeal of using empirical concepts
likely to be encountered in practice.

Suppose test results at the school entrance examination were used to form
classes of children with special programs. Let the arbitrary class boundary be
one SD. Thus children scoring 1 SD or more above the mean would go to an
accelerated program, those scoring 1 SD or worse below the mean would receive
auxiliary training, and the bulk of 68% would be divided by the mean into an above
average and a below average group. The question asked by the practitioner is how
many of the children thus classified would still turn up in the original group 5
or 6 years later. In terms of the signal detection model: how many "hits" did the
test score? If a child turned up two or more categories removed from its original
classification it surely was a "bad misplacement". Using a rather strict criterion
for hits but a larger one for misplacements takes into account the graded severity
of consequences for misses. Presumably less harm is done if misclassification
is by only one category. The proportions of correct predictions and bad
misplacements were calculated for each test. Again, only Ss with test scores for
all of the 5 or 6 years, depending on the test, were included in the sample.
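
The counting procedure can be illustrated with a short Python sketch. It is a
minimal reading of the description above, not the authors' program: the function
names and the sample Z scores are hypothetical, and the handling of scores that
fall exactly on the mean (counted as above average) is an assumption.

```python
import numpy as np

def categorize(z):
    """Assign each Z score to one of four categories.

    0: 1 SD or more below the mean   1: below average
    2: above average                 3: 1 SD or more above the mean
    Assumption: a score exactly at the mean counts as above average.
    """
    z = np.asarray(z, dtype=float)
    return np.select([z <= -1.0, z < 0.0, z < 1.0], [0, 1, 2], default=3)

def hit_and_misplacement_rates(z_first, z_last):
    """Proportion of hits (same category) and bad misplacements (2+ categories off)."""
    shift = np.abs(categorize(z_last) - categorize(z_first))
    hits = float(np.mean(shift == 0))
    bad_misplacements = float(np.mean(shift >= 2))
    return hits, bad_misplacements

# Hypothetical Z scores for 5 children at the first and last testing.
z_first = [1.2, 0.4, -0.3, -1.5, 0.1]
z_last = [1.1, -0.2, -0.5, 0.6, 1.3]
print(hit_and_misplacement_rates(z_first, z_last))
```

Children who move by exactly one category count neither as hits nor as bad
misplacements, which reflects the graded severity of misses described above.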

Figure 2 shows the outcome of the analysis with the signal detection
model. The tests are plotted in terms of correct predictions and bad misplacements
(i.e. two or more categories removed from the initial placement). The
graph can be read by dividing the plot into 4 quadrants. Let one third be the
minimum acceptable rate of correct predictions and one in 5 be the maximum
tolerable rate of bad misplacements (.33 and .20 on the Y and X axes
respectively); then quadrant II contains the tests with the absolute best
performance both in terms of a high number of hits and a small number of bad
mistakes. Quadrant IV points out the tests with all-around wrong predictions,
while quadrants I and III contain those ambiguous test performances with either
not enough usable correct predictions or too many misplacements. Achievement
tests, motor and intelligence tests accumulate in quadrant II while the
personality tests fall into quadrant [illegible] or into the ambiguous and
unsatisfactory categories. The close correspondence of the two modes of analysis
can be taken as an indication that the rather complex numerical analysis involving
transformations and laborious searches and matchings can be bypassed by the rather
simple counting procedure necessary to build the signal detection model.
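
As an illustration of how the quadrants can be read off, the following Python
sketch sorts a test into a quadrant from its two rates. Quadrants II and IV follow
the description above; assigning quadrant I to "enough hits but too many bad
misplacements" and quadrant III to "few bad misplacements but not enough hits"
assumes the usual counter-clockwise numbering and is not stated in the text. The
function name and the treatment of values exactly on a boundary are likewise
assumptions.

```python
def quadrant(hit_rate, bad_rate, min_hits=1 / 3, max_bad=0.20):
    """Return the quadrant label for one test.

    hit_rate: proportion of correct predictions (Y axis)
    bad_rate: proportion of bad misplacements (X axis)
    Assumption: values exactly on a boundary count as acceptable.
    """
    enough_hits = hit_rate >= min_hits
    few_bad = bad_rate <= max_bad
    if enough_hits and few_bad:
        return "II"    # best performance
    if enough_hits:
        return "I"     # ambiguous: usable hits but too many bad misplacements
    if few_bad:
        return "III"   # ambiguous: few bad misplacements but not enough hits
    return "IV"        # all-around wrong predictions

# Hypothetical rates for four tests.
for hits, bad in [(0.50, 0.10), (0.50, 0.30), (0.20, 0.10), (0.20, 0.40)]:
    print(hits, bad, quadrant(hits, bad))
```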

In conclusion, it might be said that while intelligence, achievement and
perceptual motor tests appear to measure relatively stable dimensions of
functioning, personality tests, at least of the inventory type, present serious
problems for predictive purposes. Reasons for this are undoubtedly complex. It is
possible to say that personality, particularly in normal children, is not yet
crystallized and is constantly emerging. Therefore stable measures should not be
expected. This conclusion contradicts psychoanalytic theory, which may be
referring to more basic personality structures. Tests of the CPQ type probably do
not reach this level. Whatever these tests are measuring would seem to be closer
to the more fluctuating traits which vary from situation to situation and from
year to year.